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1.  Introduction 

The  methods  that  are  currently  most  popular  for  solving  smooth  linear¬ 
ly  constrained  optimization  problems  of  the  form 

minimize  J(x)  (1) 

subject  to  Ax  <_  b, 

where  J:  Rn  -*  R,  A  :  mxn,  beRm,  are  based  on  solution  of  some  type  of  linear 
or  quadratic  programming  subproblems.  For  example  methods  stemming  from 
the  original  proposals  of  Goldstein  [1],  and  Levitin  and  Poljak  [2]  take 
the  form 


Vi  '  \  *  VW  <2> 

where  x^  solves 

minimize  VJ(xk)  '  (x-xk)  +  j  (x-x^  'H^x-x^  (3) 

subject  to  Ax  <  b, 

is  a  positive  definite  matrix,  and  is  a  positive  scalar  stepsize 

determined  according  to  some  rule.  This  method  is  capable  of  superlinear 

2 

convergence  if  is  either  the  Hessian  matrix  7  J  or  some  suitable  Quasi- 

2 

Newton  approximation  of  V  J  [2]-[4].  However,  for  large-dimensional  problems 
the  overhead  for  solving  problem  (3)  is  typically  prohibitive  with  such  a 
choice  of  H^  thereby  rendering  the  method  impractical. 

The  difficulty  with  excessive  overhead  in  solving  the  quadratic  program¬ 
ming  problem  (3)  can  be  bypassed  in  at  least  two  ways  if  the  constraint  set 
has  a  simple  form  (for  instance  upper  and  lower  bounds  on  the  coordinates  of 
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x,  Cartesian  products  of  simplices,  etc.),  or  has  special  structure  (for 
example  it  expresses  conservation  of  flow  equations  for  the  nodes  of  a 
directed  graph).  One  possibility  is  to  take  =  0  in  problem  (3)  so  that  (3) 
becomes  a  linear  program.  This  leads  to  methods  of  the  Frank-Wolfe  type 
[5]  which  has  been  extensively  applied  for  solution  of  multicommodity  net¬ 
work  flow  problems  [6], [8],  The  rate  of  convergence  of  these  methods  is  sublirear 
[9],  [10]  and  therefore  too  slow  for  applications  where  high  solution  accuracy 
is  demanded.  The  other  possibility  is  to  choose  the  matrix  in  (3)  to  be  positive 
definite  and  diagonal.  With  such  a  choice  it  is  often  possible  to  solve 
the  quadratic  subproblem  (3)  very  efficiently  by  exploiting  the  simple 
structure  of  the  constraint  set.  Methods  of  this  type  have  a  long  and 
quite  successful  history  in  large-scale  problems  arising  in  network  flow 
applications  [7],  [11]- [17]  as  well  as  in  other  areas  such  as  optimal  control 
[18],  [19],  However  their  rate  of  convergence  is  typically  linear  and  in 
many  applications  unacceptably  slow. 

A  somewhat  different  type  of  method  stems  from  the  original  gradient 
projection  proposal  of  Rosen  [20],  and  other  related  proposals  (the  reduced 
gradient  method  and  the  convex  simplex  method  [21]  etc.).  The  typical 
iteration  in  these  methods  proceeds  along  a  linear  manifold  of  active  con¬ 
straints  which  is  gradually  modified  during  the  algorithm  as  previously 
active  constraints  become  inactive  and  new  constraints  become  active  (see 
[22] — [ 25] ) .  These  methods  are  quite  effective  for  problems  of  smalll 
dimension  and  have  also  been  applied  in  some  network  flow  problems  [26], 

[27],  but,  in  our  view,  are  highly  unsuitable  for  large  problems  with  many 
constraints.  The  main  reason  is  that  they  typically  allow  only  one  new 
constraint  to  become  active  in  any  one  iteration.  So  if  for  example  one 
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thousand  constraints  are  active  at  the  solution  which  are  not  active  at  the 
starting  point,  these  methods  require  at  least  one  thousand  iterations  and 
likely  many  more  in  order  to  converge. 

In  this  paper  we  consider  a  projected  Newton  method  first  proposed  in 
Bertsekas  [28]  that  offers  substantial  and  often  decisive  advantages  over 
the  methods  described  above  for  large  problems  with  many  simple  constraints 
as  typified  by  a  multicommodity  flow  structure.  For  the  problem 


minimize  J(x) 
subject  to  x  >  0 


it  takes  the  simple  form 


Vi  s  [xk  -  (5) 

where  is  a  positive  scalar  stepsize,  is  a  positive  definite  symmetric 
matrix  which  is  diagonal  with  respect  to  some  of  the  coordinates  of  x,  and 
[•]  denotes  projection  (with  respect  to  the  standard  norm)  on  the  positive 
orthant.  It  is  shown  in  [28]  that  can  be  chosen  on  the  basis  of  second 
derivatives  of  J  so  that  the  method  typically  converges  superlinearly. 

Iteration  (5)  constitutes  the  basic  building  block  for  extensions  to 
more  general  inequality  constrained  problems  by  means  of  a  procedure  described 
in  [28].  In  this  paper  we  focus  on  the  case  where  the  constraint  set  is  a 
Cartesian  product  of  simplices,  and  consider  in  more  detail  a  class  of  non¬ 
linear  multicommodity  flow  problems  characterized  by  a  constraint  set  of 
this  type.  We  descrihe  an  approximate  version  of  a  Newton-like  method 
based  on  approximate  solution  of  the  Newton  system  of  equations  via  the 


•  "v- . 
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conjugate  gradient  method.  It  turns  out  that  for  network  flow  problems 
this  conjugate  gradient  method  can  be  implemented  very  efficiently  by  net¬ 
work  type  operations--a  fact  also  observed  earlier  in  a  different  context 
by  Dembo  [29].  As  a  result  a  significant  advantage  in  speed  of  convergence 
is  gained  over  earlier  methods  at  the  expense  of  relatively  small  additional 
overhead  per  iteration. 

The  notation  employed  throughout  the  paper  is  as  follows.  All  vectors 

are  considered  to  be  column  vectors.  A  prime  denotes  transposition.  The 

n  In 

standard  norm  in  K  is  denoted  by  |*|,  i.e.  for  x  =  (x  ,...,x  J  we  write 

[x |  =  [ J  (x1)2]172-  The  gradient  and  Hessian  of  a  function  f:  Rn  -►  R 

are  denoted  by  Vf  and  ^f  respectively.  All  vector  inequalities  aTe  meant 

to  be  componentwise  (for  example  x  >  0  means  x1  >  0,  i  =  l,...,n). 

2.  A  Projected  Newton  Method  for  Minimizing  a  Twice  Differentiable  Function 
on  a  Simplex 

Consider  the  problem 
minimize  J(x) 

n 

subject  to  x  ;»  0,  J  x  =  r  (6) 

i=l 

where  J:  Rn-*-R  is  twice  continuously  differentiable  and  r  is  a  given  positive 
scalar.  We  also  assume  for  convenience  that  J  is  convex  although  general¬ 
izations  of  all  the  results  and  algorithms  of  this  paper  are  possible  without 
this  assumption. 

We  describe  the  kth  iteration  of  a  Newton- like  method  for  solving  (6).  At  the 
beginning  of  the  iteration  we  have  a  feasible  vector  x^.  The  next  (feasible) 
vector  x^+j  is  obtained  by  means  of  the  following  procedure: 
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By  rearranging  indices  if  necessary  assume  that  the  last  coordinate 

x?1  satisfies 
k 

x£  =  max{x*  |  i  =  l,...n}.  (7) 

Consider  a  reduced  coordinate  system  in  the  vector  yeRn'*  given  by 


,  1  n-1.  ,  1  2  n-l. 

y  =  (y . y  )  =  (x  ,x  . ,x  ) , 


(8) 


denote  y^  =  (x^,...,x^  *) ,  and  consider  the  reduced  objective  function 


n-1 


h  (y)  =  J(y  ,...,yn"  ,  r  -  \  y1). 

K  i=l 


(9) 


Based  on  this  transformation  problem  (6)  is  equivalent  locally  (around  y^) 
to  the  problem 


minimize  h^(y)  (10) 

y  >  0 

n-i  A 

in  the  sense  that  the  constraint  r  -  \  y  >  0  is  (by  construction) 

i=l 

inactive  within  a  neighborhood  of  y^.  The  following  iteration  is  based 
on  this  fact  [compare  with  (4), (5)].  For  an  (n-l)x(n-l)  positive  definite 
symmetric  matrix  to  be  further  specified  later  denote 

yk(a)  =  [yk  -  aWy^*’  Va>  0  (11) 

where  [•]  denotes  projection  on  the  positive  orthant  [i.e.  for  a  vector 
y  =  (y1,...,y11  1),  the  vector  [y]+  has  coordinates  max{0,yi),  i  =  l,...,n-l). 
Define  the  vector  yk+^  by  means  of 

yk+l  =  yk(ak^ 


(12) 
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where  the  stepsize  is  chosen  by  means  of  a  rule  to  be  specified  further 
later  from  the  range 


o^elO.a^] 


with  ctj.  given  by 

_  n-1 

OL  =  sup  {a  |  l 
K  i=l 


yj(a) 


<  r}. 


(13) 


(14) 


[Note  that  in  view  of  (7),  (8),  (11),  we  have  >  0  or  =<*>].  The  next 
vector  x^+1  generated  by  the  algorithm  has  coordinates  given  by 


i  =  1. 


(15a) 


Xk+1 


r 


n-1 


l 

i=l 


(15b) 


We  first  note  that,  in  view  of  (11),  (13),  (14)  the  vector  x^+1  is 
feasible.  The  following  proposition  identifies  a  class  of  matrices  for 
which  a  descent  iteration  is  obtained. 

Denote 

3h,  (y.) 

lk(xk)  =  (i  |  yk  =  0,  -krK-  >  0}  (16) 

9y 


and  consider  for  all  a  >  0  the  vector  x^a)  with  coordinates  given  by 


xk(a)  =  y£(ai) ,  i  =  1, . . .  ,n-l 


n-1 


x£(a)  =  r  -  l  yj(a). 


(17) 

(18) 
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Proposition  1 :  Assume  that  the  positive  definite  symmetric  matrix  DR  is 
diagonal  with  respect  to  the  index  set  IR(xR)  in  the  sense  that  the  elements 
Dj^  of  DR  satisfy 

=  0 
k 

for  all  ieI*(xR)  and  j  =  l,...,n  with  i  t  j . 

a)  If  xR  is  a  global  minimum  of  problem  (6)  then 

xR(a)  =  xk,  Va>0 

b)  If  xR  is  not  a  global  minimum  of  problem  (6)  then  there  exists  a  efO.a^] 
such  that  for  all  ae(0,a]  the  vector  xR(a)  is  feasible,  and 

f[xR(a)]  <  f(xR),  V  ae(0,a] .  (19) 

The  proposition  above  shows  that  the  algorithm  essentially  terminates 
at  a  global  minimum  and  is  capable  of  descent  when  not  at  a  global  minimum. 

There  are  a  number  of  issues  relating  to  selection  of  the  matrix  DR 
and  the  stepsize  aR  and  associated  questions  of  convergence  and  rate  of 
convergence  which  are  addressed  in  [28]  and  will  only  be  summarized  here. 

We  first  mention  that  the  convergence  results  availabe  require  that  DR  is 
not  only  diagonal  with  respect  to  the  set  IR(xR)  but  rather  with  respect 
to  the  possibly  larger  set 

+  i  i 

I  =  (i  |  0  <  yR  <  cj,  -  —  >  0}  (20) 

3y 

where 

eR  =  min{e,  sR)  (21) 
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l  . 


e  is  a  fixed  positive  scalar,  is  given  by 

;  i  i  ;  3h,  (y.) 

Sk  =  lyk  '  lyk  -  \  —i -  J  I 


and  are  scalar  sequences  such  that 

p£  _>  p1  >  0,  k  =  0,1,... 


(22) 


with  p1  being  some  positive  scalars  which  are  fixed  throughout  the  algorithm. 
This  is  an  antizigzagging  device  of  the  type  commonly  employed  in  feasible 
direction  methods  (see  e.g.  [30]),  and  is  designed  to  counteract  the  possible 
discontinuity  exhibited  by  the  set  I*(x^)  as  x^  approaches  the  boundary  of 
the  positive  orthant.  (Actually  the  formula  used  in  [28]  is  slightly  dif¬ 
ferent  than  (22)  but  this  difference  does  not  affect  the  convergence  and 
rate  of  convergence  results  of  [28]). 

Regarding  the  choice  of  the  stepsize  o^.  there  are  at  least  two  practical 
methods  that  lead  to  algorithms  which  are  demonstrably  convergent.  In  the 
first  method  ct^  is  chosen  according  to 

=  min  (a,  a^}  (23) 


where  a  is  a  fixed  positive  constant  and  is  given  by  (14).  In  the 
second  method  an  initial  stepsize  is  chosen  and  is  successively  reduced  by 
a  certain  factor  until  a  "sufficient"  reduction  (according  to  an  Armijo- 
like  test)  of  the  objective  function  is  observed  [28],  Under  further  mild 
assumptions  it  is  possible  to  show  that  all  limit  points  of  sequences  gen¬ 
erated  by  the  algorithm  are  global  minima  of  problem  (6).  Furthermore 
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after  some  index  the  sets  1^  are  equal  to  both  I*(x^)  and  the  set  of  indices 
of  coordinates  of  y^_  that  are  zero  at  the  limit  point.  This  last  property 
is  instrumental  in  constructing  superlinearly  convergent  algorithms  as  it 
shows  that  the  portion  of  the  matrix  which  must  be  "diagonalized"  plays 
no  role  near  the  end  of  the  algorithm.  As  a  result  superlinear  convergence 
can  be  achieved  by  choosing  the  portion  of  the  matrix  D^_  that  corresponds 
to  the  indices  not  in  1^  to  be  equal  to  the  inverse  Hessian  of  h^  with 
respect  to  these  indices.  The  kth  iteration  of  the  resulting  algorithm 
can  be  restated  as  follows: 

First  the  set  I*  is  calculated  according  to  (20) -(22)  on  the  basis  of 
the  gradient  Vh^.  Then  the  vector  y  is  partitioned  as  in 


y  = 


(24) 


where  y  is  the  vector  of  coordinates  y1  with  iel*  and  y  is  the  vector  of 
coordinates  y1  with  ij^  .  Then  a  "search  direction"  d^  =  (d^.d^)  is 
obtained  solving  the  systems  of  equations 


V 


"8k 


(25) 


V  =  '8k  <26> 

9W 

where  g,  (or  g,  )  is  the  vector  with  coordinates  - ; -  with  iel, 

+  ~  ay1 

(respectively  i^I^) ,  is  a  diagonal  positive  definite  matrix,  and  is 
a  symmetric  positive  definite  matrix  which  is  equal  to  the  Hessian  of  h^ 
with  respect  to  the  coordinates  y1,  .  The  vector  y^+j 

by 


is  then  obtained 
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>w  ■  t^k  *  “k  -V*  <27’ 

where  otj.  is  the  stepsize  obtained  according  to  one  of  the  rules  mentioned 
earlier. 

We  wish  to  call  the  reader's  attention  to  the  natural  decomposition 
of  the  iteration  into  three  phases:  The  formation  of  the  index  set  I*, 
the  computation  of  the  "search  direction"  d^,  and  the  determination  of  the 
stepsize  a^.  There  is  considerable  freedom  for  variations  in  each  phase 
independently  of  what  is  done  in  other  phases  while  still  maintaining 
desirable  convergence  and  rate  of  convergence  properties. 

Approximate  Implementation  via  the  Conjugate  Gradient  Method 

Finding  the  "search  direction"  d^  requires  the  solution  of  the  linear 
system  of  equations  (26).  Solution  of  this  system  can  be  accomplished,  of 
course,  by  a  finite  method  involving  triangular  factorization  but  when  the 
dimension  of  this  system  is  large,  as  for  example  in  multicommodity  flow 
problems,  the  associated  computational  overhead  can  make  the  overall  algo¬ 
rithm  impractical.  The  alternative  is  to  solve  this  system  iteratively 
by,  for  example,  a  successive  overrelaxation  method  or  a  conjugate  gradient 
method.  This  approach  is  practiced  widely  by  numerical  analysts  [31]  and 
its  success  hinges  upon  the  ability  of  the  iterative  method  to  yield  a 
good  approximation  of  the  solution  of  system  (26)  within  a  few  iterations. 
In  order  to  guarantee  convergence  of  the  overall  optimization  algorithm 
it  is  necessary  that  the  approximate  solution,  call  it  z,  of  the  system 
(26)  satisfies 


z'gk  <  0 


(28) 


-11- 


whenever  f  0,  in  order  to  make  possible  a  reduction  in  the  objective 
function  value  [cf.  Proposition  lb)].  This  is  the  minimal  requirement  that 
we  impose  upon  the  iterative  method  used  to  solve  (26). 

In  this  paper  we  are  primarily  interested  in  approximate  solution  of 
the  system 

\  z  =  -V  f29) 


or  equivalently  the  unconstrained  minimization  problem 


min  gk  2  +  \  z,Hkz 
z 


(30) 


by  means  of  the  following  scaled  version  of  the  conjugate  gradient  method: 

A  positive  definite  symmetric  matrix  is  chosen,  and  a  sequence 
(z^jis  generated  according  to  the  iteration 


=  °*  Vl  =  zm  +  Y«V  m=0’1’** 


(31) 


where  the  conjugage  direction  sequence  (pm)  is  given  recursively  by 


P0  =  -Skr0'  P„ 


-S,  r  +  B  p  .,  m  =  1,2,..., 
km  m  m- 1 


(32) 


the  residual  sequence  {r^}  is  defined  by 


r  =  H.  z  +  g,  ,  m  =  0,l,., 
m  k  m  &k 


(33) 


and  the  scalars  Ym  and  B^  are  given  by 

,  m—  0,1,... 


r'  S.  r 
m  k  m 


p'  H,  p 
*m  k 


(34) 
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r< 

m 


S.r 
k  m 


m- 


.S.r 

lkm- 


m  1  ,  2  ,  .  * . 


(35) 


As  is  well  known  ([25],  [32])  this  method  will  find  the  solution  d^ 

of  system  (29)  in  at  most  (n-1)  steps  (i.e.,  d,  =  z  )  regardless  of  the 

k  n — a 

choice  of  S^.  We  are  primarily  interested  however  in  approximate  implementations 

whereby  only  a  few  conjugate  gradient  iterations  of  the  method  are  performed 

and  under  these  circumstances  the  choice  of  S,  can  have  a  substantial  effect 

k 

on  the  quality  of  the  final  approximate  solution.  A  suitable  choice  for  many 
problems  (and  the  one  we  prefer  for  multicommodity  flow  problems)  is  to  choose 
to  be  diagonal  with  elements  along  the  diagonal  equal  to  the  second  deriv¬ 
atives  of  the  hfc  with  respect  to  the  corresponding  coordinates  y1,  i£l* 
evaluated  at  y^.  There  are  however  other  attractive  possibilities  depending 
on  problem  structure  (see  [33]). 

It  is  easily  verified  that  if  g^  t  0,  then  we  have 

zm  h  <  °»  V  m  =  1,2,... 

so,  regardless  of  how  many  conjugate  gradient  iterations  are  performed,  the 
final  approximate  solution  z  of  system  (29)  will  satisfy  the  descent  condition 
(28). 

We  finally  mention  that  the  assumption  that  be  positive  definite  is 
not  strictly  necessary  for  the  preceding  algorithm  to  generate  a  descent 
direction.  It  is  sufficient  that  g^  +  0  and  be  such  that  the  quadratic 
•ptimization  problem  (30)  have  at  least  one  globally  optimal  solution.  It 
turns  out  that  this  minor  refinement  is  significant  for  the  multicommodity 
flow  problems  to  be  considered  in  the  next  section. 
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Extension  to  the  Case  where  the  Constraint  Set  is  a  Cartesian  Product  of 
Simplices  '  -  -  -  - 

Consider  the  problem 


minimize  J[x(l) , . . . ,x(m)] 

n. 

J  i 

subject  to  x(j)  >0,  l  x  (j)  =  r . ,  j  =  1,.., 

i=l  J 


(37) 


,m 


n. 


where  each  x(j),  j  =  l,...,m  is  a  vector  in  R  the  function 
n1+. . . +n 

J  '•  R  ^  R  is  twice  continuously  differentiable  and  r  ^ ,  j  =  1, _ ,m 

are  given  positive  scalars. 

The  extension  of  the  method  described  earlier  in  this  section  to  handle 
problem  (37)  is  evident  once  it  is  realized  that  one  can  similarly  pass  to 
a  reduced  coordinate  system  of  dimensioning. .  ,+n^-m)  while  in  the  process 


J 


eliminating  the  m  equality  constraints  J  xX(j)  *  rfj),  j  =  l,...,m,  [cf. 


i=l 

(8),  (15)].  One  then  obtains  a  reduced  problem  involving  nonnegativity  con¬ 
straints  only  [cf.  (9),  (10)]  which  is  locally  (around  the  current  iterate) 
equivalent  to  problem  (37) .  The  iteration  described  earlier,  including  the 
conjugate  gradient  approximation  process,  is  fully  applicable  to  the  reduced 
problem. 


3.  Optimization  of  Multicommodity  Flows 

We  consider  a  network  consisting  of  N  nodes  1,2,...,N  and  a  set  of 
directed  links  denoted  by  L.  We  denote  by  (i , £)  the  link  from  node  i  to 
node  l,  and  assume  that  the  network  is  connected  in  the  sense  that  for  any 
two  nodes  m,n  there  is  a  directed  path  from  m  to  n.  We  are  given  a  set  W 
of  ordered  node  pairs  referred  to  as  origin-destination  (or  OD)  pairs.  For 
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each  OD  pair  weW,  we  are  given  a  set  of  directed  paths  Pw  that  begin  at 
the  origin  node  and  terminate  at  the  destination  node.  For  each  weW  we 
are  also  given  a  positive  scalar  rw  referred  to  as  the  input  of  OD  pair  w, 
and  this  input  must  be  optimally  divided  among  the  paths  in  P^  so  as  to 
minimize  a  certain  objective  function. 

For  every  path  peP^  corresponding  to  an  OD  pair  weW,  we  denote  by  xP 
the  flow  travelling  on  p.  These  flows  must  satisfy 

£  x15  =  r  ,  VweW  (38) 

peP  W 

r  w 


X 


p 


>  0, 


v  pePw,  weW. 


(39) 


Equations  (38),  (39)  define  the  constraint  set  of  the  optimization  problem-- 
a  Cartesian  product  of  simplices. 

To  every  set  of  path  flows  {xP  |  pePw>  weW}  satisfying  (38),  (39)  there 

corresponds  a  flow  f . .  for  every  link  (i,£).  It  is  defined  by  the  relation 
X  v 

f.{  =  y.  I  6n(i,Jl)xP,  V(i,£)eL  (40) 

*  weW  peP  p 
w 

where  6^ (i , fi.)  =  1  if  the  path  p  contains  the  link  (i,£)  and  6p(i,£)  =  0 
otherwise.  If  we  denote  by  x  and  f  the  vectors  of  path  flows  and  link 
flows  respectively  we  can  write  relation  (40)  as 

f  =  Ex  (41) 


where  E  is  the  arc-chain  matrix  of  the  network. 

For  each  link  (i,£)  we  are  given  a  convex  twice  continuously  differ¬ 
entiable  scalar  function  with  strictly  positive  second  derivative 
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for  all  fU  >  °-  The  objective  function  is  given  by 


D‘f>  ■  -  l  Du‘V- 


(i,Jl)eL 


(42) 


By  using  (41)  we  can  write  the  problem  in  terms  of  the  path  flow  variables 
xP  as 


minimize 
subject  to 


J(x)  =  D (Ex) 

l  xp  =  r  ,  v  weW 

peP 
r  w 

xp  0,  v  PePw>  weW* 


(43) 


In  communication  network  applications  the  function  D  may  express,  for 
example,  average  delay  per  message  [6],  [11]  or  a  flow  control  objective 
[34] ,  while  in  transportation  networks  it  may  arise  via  a  user  or  system 
optimization  principle  formulation  [16],  117],  [35].  The  algorithm  to  be 
presented  admits  an  extension  to  the  case  where  the  function  D  does  not 
have  the  separable  form  (42) ,  but  we  prefer  to  concentrate  on  the  simpler 
and  practically  important  separable  case  in  order  to  avoid  further  com¬ 
plications  in  our  notation. 

Clearly  problem  (42)  falls  within  the  framework  of  the  previous  section 
and  the  approximate  version  of  the  projected  Newton  method  described  there 
can  be  applied  for  its  solution.  A  key  element  for  the  success  of  this 
algorithm  lies  in  that  the  conjugate  gradient  iterations  required  for 
approximate  solution  of  the  corresponding  system  of  equations  can  be  carried 
out  very  efficiently.  This  in  turn  hinges  on  the  fact  that  the  product  of 
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the  matrix  with  various  vectors,  which  is  needed  for  the  computation  of 
the  residual  rm  in  (33)  and  the  stepsize  ym  in  (34),  can  be  computed  by 
graph  type  operations  without  explicitly  computing  or  storing  the  matrix  H^. 

We  now  describe  the  kth  iteration  of  the  algorithm  whereby  given  a 
feasible  vector  of  path  flows  x^  we  find  the  next  vector  x^+j: 

Phase  1 :  (Determination  of  the  reduced  coordinate  system  and  the  set  I*) . 
For  each  weW  let  p^  be  the  path  carrying  maximal  flow,  i.e., 

p 

xkw  =  max  {xP  |  pePw>.  VweW  (44) 


Define  the  reduced  coordinate  system  in  the  vector  y  given  by  [cf.  (8)] 


yP  =  xp,  v  P£PW  with  p  /  Pw  and  weW, 


(45) 


and  denote  by  y^  the  vector  that  corresponds  to  x^  according  to  this  trans¬ 


formation.  Consider  the  reduced  objective  function  h^(y)  =  J(x)  [cf.  (9)] 
where  x  has  coordinates  given  by  xP  =  yP,  vpeP  with  p  j*  p  and  weW  and 


=  r 


I 


xP. 


(46) 


peP„ 


p/p„ 


Denote  D!^  and  DV^  the  first  and  second  derivatives  of  D^  evaluated 


at  x^,  and  define  the  first  derivative  length  of  a  path  p  by 


l  DJo  ,  vpeP  ,  weW, 
(i.Dep  151 


(47) 


i.e.,  1  is  the  sum  of  first  derivatives  Dj^  over  all  links  on  the  path  p. 


It  is  easily  verified  that 
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aJ(x.) 

-----  =  1  ,  V  pePw.  weW  (48) 

9xf  f 


and  that  the  gradient  of  the  reduced  objective  function  is  given  by 
3hk(yk} 


9yF 


VpePw,  w£W 


(49) 


By  differentiating  this  expression  with  respect  to  yP  we  also  find  after  a 
straightforward  calculation  the  diagonal  elements  of  the  Hessian  V2!^ 


3  yyk) 

Oyp)2 


I 

(i  ,5,)eL 


°i£  *  VP£PW,  P  ^  Pw.  wGW 


(50) 


where  L  is  the  set  of  links  that  are  traversed  by  either  the  path  p  or  the 

_£ _ 

path  p^  but  not  both.  In  view  of  our  assumption  D^(f.^)  >  0  for  all  f  ^  ^  0 
we  see  that  there  exist  scalars  yP  such  that 


P 


P 

k 


aVrk> 

(3yV 


>  Pp  >  o. 


VpePw,  P  /  Pw»  «eW 


(51) 


for  all  feasible  vectors  y^. 

We  are  now  in  a  position  to  define  the  set  in  terms  of  a  positive 
scalar  e  >0  which  remains  fixed  throughout  the  algorithm.  We  set  [cf. 
(20)- (22).  (49)- (51) ] 

=  (P  I  0  £  y£  1  1  >1  .  PePw>  P  Pw»  Wew)  (52) 

where 

e£  =  min  (e,  sp}  (53) 


and 
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i  =  K  -  i*k  -  ^(iP  -  ^  )1+|  *  vptfw,  P  *  pw.  w, 


An  equivalent  definition  is  that  a  path  p  belongs  to  1^  if  it  has  a  larger 
first  derivative  length  than  the  corresponding  reference  path  pw>  and  it 
carries  flow  that  is  less  or  equal  to  both  e  and  -i,  ).  As  will  be 

seen  later  the  algorithm  "tries"  to  set  the  flow  of  these  paths  to  zero 
[cf.  (57), (69)]. 

Phase  2:  (Computation  of  the  search  direction) 

As  in  the  previous  section  we  form  a  partition  of  the  vector  y  [cf.  (24)] 


where  y  is  the  vector  of  path  flows  yP  with  pel^  and  y  is  the  vector  of 
path  flows  yp  with  pjtl^ .  The  search  direction  d^,  partitioned  consistently 


with  (55) 


is  defined  as  follows  [cf.  (25),  (26)].  For  paths  pel^  we  set 


dl  =  'Uk(1p"1pw)'  vpElk 


i.e.  the  matrix  H,  of  (25)  is  set  to  the  diagonal  matrix  with  elements 
2  K 
3  \(yk) 

- = —  along  the  diagonal. 

OyP) 

For  paths  ptly  the  search  direction  is  defined  by 


«kdk  =  -«i 


1 
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where  is  the  gradient  of  with  respect  to  y  having  coordinates 

(1-1  ),  [cf.  (49)]  and  H,  is  the  Hessian  matrix  of  h.  with  respect  to  y. 

F  Pw  K  k 

This  equation  will  be  solved  (perhaps  approximately)  by  means  of  the  con¬ 
jugate  gradient  method  described  in  the  previous  section  [cf.  equations 
(31)- (35)].  As  scaling  matrix  in  (32)  and  (35)  we  will  choose  the 
diagonal  matrix  with  diagonal  elements  the  scalars  yj^>  p£l*,  P  t  Pw»  weW, 
given  by  (50)  and  (51).  From  equations  (31)- (35)  it  is  evident  that  the 
only  difficult  part  in  implementing  the  conjugate  gradient  iteration  lies 
in  computing  vectors  of  the  form 


v  =  H^Ay  (59) 

where  Ay  is  any  vector  of  dimension  equal  to  the  number  of  paths  p  with 
P^Ik  and  P  t  Pw«  weW.  There  are  two  such  vectors  to  be  computed  at  each 
iteration,  the  vector  H^^  appearing  in  the  residual  equation 


r 

m 


V.  *  8k 


[cf.  (33)],  and  the  vector  H^Pm  appearing  in  the  stepsize  equation 


[cf.  (34)].  However  it  is  important  to  note  that  only  one  of  these  vectors 
(the  vector  H^pm)  needs  to  be  computed  by  solution  of  an  equation  such  as 
(59)  at  each  conjugate  gradient  iteration. 

Indeed  this  iteration  has  the  form  [cf  (31)] 


and  therefore  we  have 

*Vo  ■  »•  V.1  -  V«  *  V*k>V  "  • 

Hence  the  vector  can  be  computed  from  the  previous  vector  H^zm  and 

the  vector  H^pm«. 

A  key  fact  is  that  in  order  to  compute,  for  a  given  Ay»  the  vector 
v  =  y  of  (59)  we  need  not  form  explicitly  the  matrix  and  multiply 

it  with  Ay.  Indeed  consider  the  following  function 

'  i  of.,*  (60) 

of  the  incremental  flow  vector  Af  and  the  corresponding  function  of  the 
reduced  incremental  path  flow  vector  Ay 

(^(Ay)  =  Gk(EAx)  (61) 

obtained  via  the  transformation 

Af  =  E  Ax  (62) 


[cf.  (41)]  and  the  transformation 


Ayp  =  Axp  ,  VpePw,  p£l£,  P  t  Pw»  weW, 


AxP  =  0, 


l  yP,  VweW. 


•  /*♦ 
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The  Hessian  of  the  function  is  the  same  as  the  Hessian  of  the  objective 

k  k 

function  D  evaluated  at  the  flow  vector  f  corresponding  to  x  ,  and  con¬ 
sequently  the  Hessian  of  the  function  with  respect  to  the  vector  y  is 
equal  to  the  matrix  H^.  For  any  vector  Ay  the  vector  v  =  H^Ay  is  there¬ 
fore  equal  to 

V  =  Vy  =  Wfcfoy)-  (66) 

On  the  other  hand  we  have  already  shown  how  to  compute  the  gradient  of 
functions  such  as  [cf.  (47) -(49)].  The  procedure  consists  of  finding 
the  incremental  flow  vectors  Af. 0  corresponding  to  Ay  according  to  (62) -(64) 
and  (65)  and  forming  the  products  DV^Af-^  for  each  link.  Then  the  coordinates 
of  the  vector  v  of  (66)  are  given  by  [cf.  (48),  (49)]. 


,  L_  DU  fu 


(i,£)ep 


(i»£)  ep,. 


VpePu,  p£Iv,  P  *  P  .  w£W.  (67) 


Thus  the  products  H^^  and  H^pm  appearing  in  the  basic  iteration  of 
the  conjugate  gradient  method  (31) -(35)  can  be  calculated  by  the  procedure 
described  above  without  the  need  to  compute  or  store  the  matrix  H^.  Since 
all  other  operations  in  (31) -(35)  require  either  the  formation  of  inner 
products  of  vectors  or  the  multiplication  of  a  vector  with  a  diagonal 
matrix  it  can  be  seen  that  the  Newton-like  method  can  be  implemented  via 
the  conjugate  gradient  method  by  graph-type  operations  and  without  explicit 
computation  or  storage  of  any  Hessian  matrix. 

In  a  practical  implementation  of  the  algorithm  one  should  not  try  to 
solve  the  system  (58)  exactly  at  each  iteration  since  this  typically 
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requires  a  large  number  of  iterations  of  the  conjugate  gradient  method. 

Rather  one  should  terminate  the  conjugate  gradient  iterations  according  to 
some  criterion.  Some  possible  criteria  are  as  follows: 

a)  Terminate  after  a  fixed  number  of  conjugate  gradient  iterations. 

b)  Terminate  at  an  iteration  m  if  the  residual  r  satisfies 

m 

IrJ  <  B^rJ  (68) 

where  is  some  scalar  factor  less  than  unity  which  may  depend  on  the 
iteration  index  k. 

=  -g^  exactly  and  yields 
Newton's  method.  Thus  if  3^  -*■  0  one  expects  that  it  is  possible  to  construct 
a  method  that  realizes  the  super linear  convergence  rate  of  Newton's  method 
by  making  use  of  a  proper  method  for  choosing  the  stepsize  a^.  (A  result ^of 
this  type  is  shown  for  the  unconstrained  Newton's  method  in  [36]). 

Phase  3:  (Determination  of  the  stepsize  a^) 

As  usual  in  Newton-like  methods,  we  first  try  a  unity  stepsize  and 
subsequently  reduce  it  if  certain  conditions  are  not  satisfied.  Thus  we 
form  the  vector 

yk*l  =  lyk  +  dk1  +  (69) 


Taking  =  0  means  solving  the  system  H^Ay, 


where  is  the  search  direction  obtained  in  the  previous  phase.  This 
vector  may  not  lead  to  a  feasible  path  flow  vector  since  any  one  of  the 
constraints 


r  - 


w 


l 

P£Pw 

P*PW 


>  0 


VweW 


(70) 
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may  be  violated  (particularly  when  far  from  the  solution).  In  this  case 
the  stepsize  should  be  adjusted  so  that  these  constraints  are  satisfied. 
This  can  be  done  by  considering  the  vector 

yk(a)  =  [yk  +  adkl+,  ,  a  >  0  (71) 

and  find  the  largest  stepsize  ct^  for  which  all  the  constraints 

y  yf(a)  <  r  ,  wcW  (72) 

pePw 

P^PW 

are  satisfied.  The  simplest  way  to  determine  is  to  compute  for  each 

— w 

OD  pair  w  the  largest  stepsize  for  which  (72)  is  satisfied  and  obtain 
by  means  of  the  equation 

\  =  min  |  weW}.  (73) 

One  may  then  successively  reduce  the  value  of  by  multiplication  by  a 
factor  less  than  unity  until  a  sufficient  reduction  of  the  objective 
function  is  effected  in  the  spirit  of  the  Armijo  rule  (see  [28]). 

The  stepsize  selection  method  described  above  can  be  rigorously  shown 
to  lead  to  convergence  of  the  resulting  algorithm  by  means  of  a  proof  which 
is  very  similar  to  one  given  in  [28],  On  the  other  hand  for  specific  ap¬ 
plications  other  stepsize  selection  methods  may  be  more  attractive  even 
though  their  theoretical  properties  may  not  be  as  reassuring.  For  example 
in  routing  problems  arising  in  communication  networks  which  are  solved  in 
real  time  it  is  difficult  to  compute  the  value  of  the  objective  function 
thereby  making  line  search  somewhat  impractical. 


I 
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It  is  also  cumbersome  to  coordinate  the  computation  of  via  (73) 
between  all  OD  pairs.  In  such  cases  it  is  easier  to  implement  other 
iterations  such  as  for  example 

yj+i  *  [yj  +  \  A*k1+’  vpePw’  p  **  pw’  weW  (74) 

and  forego  any  tests  of  reduction  of  the  function  value.  Our  computational 
experience  suggests  that  iteration  (74)  is  for  many  networks  just  as  com¬ 
putationally  efficient  as  any  other  iteration  based  on  line  search.  On 
the  other  hand  we  have  found  examples  where  iteration  (74)  does  not  con¬ 
verge  to  an  optimal  solution,  and  therefore  cannot  recommend  it  for  general 
networks.  The  subject  of  stepsize  selection  without  line  search  is  current¬ 
ly  under  investigation. 

There  are  a  number  of  convergence  and  rate  of  convergence  results 
that  one  can  show  for  the  algorithm  described  above  and  its  variations. 

The  nature  of  these  results  and  their  proofs  are  very  similar  to  those 
given  in  [28],  as  well  as  in  other  sources  [31],  [36],  and  we  will  not  give 
a  complete  account.  We  only  mention  that  it  is  possible  to  show  that  if 
the  stepsize  of  (73)  is  used,  and  if  the  algorithm  is  started  sufficient¬ 
ly  close  to  a  global  minimum  and  a  sufficiently  accurate  solution  of  the 
Newton  system  (58)  is  obtained  via  the  conjugate  gradient  method  [i.e. 
the  scalar  8^  in  (68)  is  sufficiently  small]  then  the  method  converges  to 
a  global  minimum,  and  for  all  k  the  stepsize  ct^  will  be  unity.  If  in  ad¬ 
dition  8^  -*■  0  then  the  rate  of  convergence  will  be  superlinear. 

We  finally  mention  that  in  some  cases  the  number  of  paths  in  may 
be  very  large  and  it  may  be  unwieldly  to  keep  track  of  all  the  path  flows 
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xP,  as  for  example  when  is  the  set  of  all  directed  paths  joining  OD  pair 
w.  In  this  case  typically  the  vast  majority  of  path  flows  at  the  optimum 
is  zero  and  it  is  better  to  work  with  a  small  subset  of  paths  of  each  OD 
pair  w  that  carry  positive  flow.  This  subset  is  possible  augmented  at 
each  iteration  by  a  path  of  minimum  first  derivative  length  which  is 
determined  via  a  shortest  path  computation  (see  [13],  [15],  [16]). 
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